Smoothing Multivariate Performance Measures

Authors

  • Xinhua Zhang
  • Ankan Saha
  • S. V. N. Vishwanathan
Abstract

A Support Vector Method for multivariate performance measures was recently introduced by Joachims (2005). The underlying optimization problem is currently solved using cutting plane methods such as SVM-Perf and BMRM. One can show that these algorithms converge to an $\epsilon$ accurate solution in $O\left(\frac{1}{\lambda\epsilon}\right)$ iterations, where $\lambda$ is the trade-off parameter between the regularizer and the loss function. We present a smoothing strategy for multivariate performance scores, in particular precision/recall break-even point and ROCArea. When combined with Nesterov's accelerated gradient algorithm, our smoothing strategy yields an optimization algorithm which converges to an $\epsilon$ accurate solution in $O^*\left(\min\left\{\frac{1}{\epsilon}, \frac{1}{\sqrt{\lambda\epsilon}}\right\}\right)$ iterations. Furthermore, the cost per iteration of our scheme is the same as that of SVM-Perf and BMRM. Empirical evaluation on a number of publicly available datasets shows that our method converges significantly faster than cutting plane methods without sacrificing generalization ability.

1 Background and Introduction

Different kinds of applications served by machine learning algorithms have varied and specific measures to judge the performance of the algorithms. In this paper we focus on efficient algorithms for directly optimizing multivariate performance measures such as precision/recall break-even point (PRBEP) and area under the Receiver Operating Characteristic curve (ROCArea). Given a training set with $n$ examples $X := \{(x_i, y_i)\}_{i=1}^{n}$ where $x_i \in \mathbb{R}^p$ and $y_i \in \{+1, -1\}$, Joachims (2005) proposed an elegant formulation for this problem which minimizes the following regularized risk:

$$\min_{w} J(w) = \frac{\lambda}{2} \|w\|^2 + R_{\text{emp}}(w). \qquad (1)$$

Here $\frac{1}{2}\|w\|^2$ is the regularizer, $\lambda > 0$ is a trade-off parameter, and the empirical risk $R_{\text{emp}}$ for contingency table based multivariate performance measures is $R_{\text{emp}}(w) = \max_{z \in \{-1,1\}^n}$ [
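To make the difference between the two iteration bounds concrete, the following minimal sketch compares their orders of growth. It assumes the rates are read as $O(1/(\lambda\epsilon))$ for the cutting plane methods and $O^*(\min\{1/\epsilon, 1/\sqrt{\lambda\epsilon}\})$ for the smoothed scheme, with $\epsilon$ the target accuracy. The function names and the sample values of `lam` and `eps` are hypothetical, and all constants and logarithmic factors are dropped, so the numbers indicate scaling only, not actual iteration counts.

```python
import math


def cutting_plane_order(lam: float, eps: float) -> float:
    """Order of the assumed cutting plane bound 1/(lam*eps), constants dropped."""
    return 1.0 / (lam * eps)


def smoothed_order(lam: float, eps: float) -> float:
    """Order of the assumed smoothed bound min{1/eps, 1/sqrt(lam*eps)},
    with constants and log factors dropped."""
    return min(1.0 / eps, 1.0 / math.sqrt(lam * eps))


if __name__ == "__main__":
    eps = 1e-2  # illustrative target accuracy (assumption)
    for lam in (1e-1, 1e-2, 1e-4, 1e-6):  # illustrative trade-off values
        cp = cutting_plane_order(lam, eps)
        sm = smoothed_order(lam, eps)
        print(f"lambda={lam:g}: cutting plane ~{cp:.3g}, smoothed ~{sm:.3g}")
```

Under this reading, the gap widens as $\lambda$ shrinks: the cutting plane order grows like $1/\lambda$, while the smoothed order grows like $1/\sqrt{\lambda}$ and is capped at $1/\epsilon$.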

Similar resources

Models and Methods for Spatial Data: Applications in Epidemiological, Environmental and Ecological Studies

This thesis develops new methodologies for applied problems using smoothing techniques for spatial or spatial temporal data. We investigate Bayesian ranking methods for identifying high risk areas in disease mapping, assessing these particularly with regard to their performance in isolating emerging unusual and extreme risks in small areas. We build on information obtained through mapping multivar...

Hazards of Digital Smoothing Filters as a Preprocessing Tool in Multivariate Calibration

The efficacy of smoothing first-order data as a preprocessing method for multivariate calibration is discussed. In particular, the use of symmetric smoothing filters (such as Savitzky–Golay filters) is examined from the perspective of calibration performance, in contrast with past studies based on univariate signal-to-noise improvement. It is shown mathematically that in the limit of a perfect ...

Multivariate exponential smoothing for forecasting tourist arrivals to Australia and New Zealand

In this paper we propose a new set of multivariate stochastic models that capture time varying seasonality within the vector innovations structural time series (VISTS) framework. These models encapsulate exponential smoothing methods in a multivariate setting. The models considered are the local level, local trend and damped trend VISTS models with an additive multivariate seasonal component. W...

Using statistical smoothing to date medieval manuscripts

We discuss the use of multivariate kernel smoothing methods to date manuscripts dating from the 11th to the 15th centuries, in the English county of Essex. The dataset consists of some 3300 dated and 5000 undated manuscripts, and the former are used as a training sample for imputing dates for the latter. It is assumed that two manuscripts that are “close”, in a sense that may be defined by a ve...

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...

Journal:
  • Journal of Machine Learning Research

Volume 13

Publication date: 2011